Abstract: It is shown that the capacity of the channel modeled
by (a discretized version of) the stochastic nonlinear Schrödinger
(NLS) equation is upper-bounded by $\log(1 + \mathrm{SNR})$ with
$\mathrm{SNR} = \mathcal{P}_0/\sigma^2(z)$, where $\mathcal{P}_0$ is the
average input signal power and $\sigma^2(z)$ is the total noise power
up to distance $z$. The result is a consequence of the fact that the
deterministic NLS equation is a Hamiltonian energy-preserving
dynamical system.

I. Introduction 

Half a century after the introduction of the optical fiber,
the problem of determining its capacity remains open. This
holds even for the single-user point-to-point channel subject
to a power and bandwidth constraint. There is also a lack of
general upper bounds, as well as lower bounds in the high-power
regime. The asymptotic capacity when power $\mathcal{P} \to \infty$
is also unknown.

Numerical simulations of the optical fiber channel with
additive white Gaussian noise (AWGN) seem to indicate that
the data rates that can be achieved using current methods
are below $\log(1 + \mathrm{SNR})$, the capacity of an AWGN channel
with signal-to-noise ratio SNR. In this paper, we prove this
conjecture, namely, we show that

$$C \le \log(1 + \mathrm{SNR}), \tag{1}$$

where $\mathrm{SNR}(z) = \mathcal{P}_0/\sigma^2(z)$, in which
$\mathcal{P}_0$ is the average input signal power and $\sigma^2(z)$ is
the total noise power up to the distance $z$. Here $C$ is the capacity
of the point-to-point channel per complex degree-of-freedom.

Motivated by recent developments suggesting that the nonlinearity
can be constructively taken into account in the design of
communication schemes to potentially address the capacity
bottleneck problem in optical fiber [1]-[5], it has been
speculated that data rates above $\log(1 + \mathrm{SNR})$ may even be
achievable. While the nonlinearity can be exploited, as for
instance in [1]-[5], the upper bound (1) shows that it does not
offer any gain in capacity relative to the linear channel. All one
can hope for is to embrace nonlinearity in the communication
design so that it does not penalize the capacity at high powers.
This is expected in the (closed) conservative system (2), which
does not include any gain (amplification) mechanism.

Throughout this paper, lower and upper case letters represent,
respectively, deterministic and random variables. Row
vectors are denoted by underline, e.g., $\underline{Q}^n = (Q_1, \cdots, Q_n)$.
As usual, $\mathbb{R}$, resp. $\mathbb{C}$, denotes the set of real,
resp. complex, numbers. The imaginary unit is denoted by $j = \sqrt{-1}$.


II. Continuous-time Channel Model and Its 
Discretization 

Let $Q(t,z) : \mathbb{R} \times \mathbb{R}^+ \mapsto \mathbb{C}$ be a
function of time $t$ and space $z$. Signal propagation in optical fiber
is described by the stochastic nonlinear Schrödinger (NLS) equation
[6, Eq. 3]

$$j\partial_z Q = \partial_{tt} Q + 2|Q|^2 Q + W(t,z). \tag{2}$$

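Here $W(t,z)$ is space-time white circularly symmetric complex
Gaussian noise with constant power spectral density $\sigma_0^2$
and bandlimited to $[-B/2, B/2]$, i.e.,

$$\mathsf{E}\left(W(t,z)W^*(t',z')\right) = \sigma_0^2\,\delta_B(t - t')\,\delta(z - z'),$$

where $\delta_B(x) = B\,\mathrm{sinc}(Bx)$, $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$, and
$\delta(x)$ is the Dirac delta function. The transmitted signal power
is limited so that

$$\lim_{\mathcal{T} \to \infty} \frac{1}{\mathcal{T}}\, \mathsf{E} \int_{-\mathcal{T}/2}^{\mathcal{T}/2} |Q(t,0)|^2\, dt \le \mathcal{P}_0. \tag{3}$$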

We discretize the continuous-time model (2) by considering
the partial differential equation (PDE) (2) with periodic
boundary conditions

$$Q(t + T, z) = Q(t, z), \quad \forall t, z,$$

where $T$ is the signal period. Substituting the two-dimensional
Fourier series (see [7, Sections III and V])

$$Q(t,z) = \sum_{k=-\infty}^{\infty} Q_k(z)\, e^{j(k\omega_0 t + k^2\omega_0^2 z)}$$

into the NLS equation (2), we obtain

$$\partial_z Q_k(z) = -2j \sum_{lmn} e^{j\Omega_{lmnk} z}\, Q_l(z) Q_m(z) Q_n^*(z)\, \delta_{lmnk} + W_k(z), \tag{4}$$

where $\delta_{lmnk} = \delta[l + m - n - k]$, $\delta[k]$ is the Kronecker delta
function, and

$$\Omega_{lmnk} = \omega_0^2\,(l^2 + m^2 - n^2 - k^2), \quad \omega_0 = 2\pi/T.$$
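The structure of the coupled system (4) can be explored numerically. The following is a minimal sketch, not the authors' code: it truncates the noise-free version of (4) to a handful of positive-frequency modes, integrates it with a classical fourth-order Runge-Kutta scheme, and compares the total energy at the input and output. The mode count, step size, `omega0 = 1`, and the random input are illustrative assumptions.

```python
import cmath
import random


def rhs(z, q, omega0):
    """Right-hand side of the noise-free system (4) for modes k = 1..n."""
    n = len(q)
    dq = []
    for k in range(1, n + 1):
        acc = 0j
        for l in range(1, n + 1):
            for m in range(1, n + 1):
                p = l + m - k  # third index, fixed by the Kronecker delta
                if 1 <= p <= n:
                    omega = omega0 ** 2 * (l * l + m * m - p * p - k * k)
                    acc += (cmath.exp(1j * omega * z)
                            * q[l - 1] * q[m - 1] * q[p - 1].conjugate())
        dq.append(-2j * acc)
    return dq


def rk4_step(z, q, h, omega0):
    """One classical Runge-Kutta step of size h in distance z."""
    k1 = rhs(z, q, omega0)
    k2 = rhs(z + h / 2, [a + h / 2 * b for a, b in zip(q, k1)], omega0)
    k3 = rhs(z + h / 2, [a + h / 2 * b for a, b in zip(q, k2)], omega0)
    k4 = rhs(z + h, [a + h * b for a, b in zip(q, k3)], omega0)
    return [a + h / 6 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(q, k1, k2, k3, k4)]


def energy(q):
    """Total energy, i.e. the sum of |q_k|^2 over the retained modes."""
    return sum(abs(a) ** 2 for a in q)


def propagate(q0, z_end, steps, omega0=1.0):
    """Integrate the noise-free system from 0 to z_end."""
    q, h = list(q0), z_end / steps
    for i in range(steps):
        q = rk4_step(i * h, q, h, omega0)
    return q


rng = random.Random(0)  # fixed seed, purely illustrative input
q_in = [complex(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(4)]
q_out = propagate(q_in, z_end=0.5, steps=1000)
print(energy(q_in), energy(q_out))
```

In this sketch the modes themselves change along the link while the printed energies agree up to the integration error, which is the behaviour the energy-preservation argument relies on.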

We assume that $T \to \infty$ so that the discrete model (4)
captures the infinitely many signal degrees-of-freedom in the
continuous model (2) in a one-to-one manner. As a result,
$W_k$ are uncorrelated circularly symmetric complex Gaussian
random variables, with

$$\mathsf{E}\left(W_k(z)W_{k'}^*(z')\right) = f_0\sigma_0^2\,\delta[k - k']\,\delta(z - z'), \quad f_0 \triangleq 1/T.$$

The coupled stochastic ordinary differential equation (ODE)
system (4) defines a discrete vector communication channel
in the frequency domain $\underline{Q}^n(0) \mapsto \underline{Q}^m(z)$. For notational
convenience, we limit to positive frequencies so that vector
indices start from one. We denote the action of the stochastic
ODE system (4) on input $\underline{Q}^n(0)$ by $S_z$, i.e.,
$\underline{Q}^m(z) = S_z(\underline{Q}^n(0))$. We denote the action of the
deterministic (noiseless) system (where $W_k = 0$) on input
$\underline{Q}^n(0)$ by $T_z$, i.e., $\underline{Q}^m(z) = T_z(\underline{Q}^n(0))$.
The power constraint (3) is discretized to $\mathcal{P}(0) \le \mathcal{P}_0$, where

$$\mathcal{P}(z) \triangleq \sum_{k=1}^{n} \mathsf{E}|Q_k(z)|^2. \tag{5}$$

In this paper, we assume $n = m$ and study the capacity of the
discretized channel $S_z$, instead of the original continuous-time
channel (2). See Remark 1 for the case $n \neq m$.

The upper bound (1) on the capacity of $S_z$ is obtained as
follows. The transformation $T_z$ is energy-preserving, implying
that the output power in $S_z$ is $\mathcal{P}(0) + \sigma^2(z)$,
$\sigma^2(z) = B\sigma_0^2 z$. Consequently, the output (differential)
entropy rate is upper-bounded, from the maximum entropy theorem, by
$c_n + \log(\mathcal{P}_0 + \sigma^2(z))$, $c_n = \log(\pi e/n)$. For the
conditional entropy, note that noise is added continuously along the
link. The entropy power inequality (EPI) implies that the
conditional entropy rate is not less than the (overall) noise
entropy rate $c_n + \log(\sigma^2(z))$. Combining these two results,
$C \le \log(1 + \mathrm{SNR})$. In what follows, we establish these two
steps.
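As a quick numerical illustration of the bound, the sketch below evaluates $\log_2(1 + \mathrm{SNR})$ with $\sigma^2(z) = B\sigma_0^2 z$. The function names and the normalized parameter values are hypothetical and chosen only for illustration; the base-2 logarithm is an assumption that gives the result in bits.

```python
import math


def snr(p0, sigma0_sq, bandwidth, z):
    """SNR(z) = P0 / sigma^2(z), with sigma^2(z) = B * sigma0^2 * z."""
    return p0 / (bandwidth * sigma0_sq * z)


def capacity_upper_bound(p0, sigma0_sq, bandwidth, z):
    """Upper bound log2(1 + SNR) in bits per complex degree-of-freedom."""
    return math.log2(1.0 + snr(p0, sigma0_sq, bandwidth, z))


# Normalized, made-up values: the bound shrinks as noise accumulates with z.
for distance in (1.0, 2.0, 4.0):
    print(distance, capacity_upper_bound(3.0, 1.0, 1.0, distance))
```

Because the total noise power grows linearly in distance, the bound decreases monotonically with $z$ at fixed input power.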

The use of the EPI in bounding the conditional entropy 
rate is an important step in our proof. It is therefore worth 
elaborating on the EPI briefly, to see why entropy should 
increase at least by a constant amount at each point that noise 
is added along the link. In Appendix A we briefly review this
interesting inequality. 

III. Upper Bound 
A. Upper Bound on the Output Entropy 

Lemma 1 (Monotonicity of the Power in $S_z$). Let $B$ be the
common signal and noise (passband) bandwidth from input to
output. The output average power in $S_z$ is

$$\mathcal{P}(z) = \mathcal{P}(0) + \sigma^2(z). \tag{6}$$


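Proof: Since the signal and noise are commonly bandlimited
to $B$, $Q_k$ and $W_k$ are supported in $1 \le k \le n$ for all $z$,
$n = B/f_0$. Taking the derivative with respect to $z$ in (5) and
using (4), we obtain

$$\begin{aligned}
\frac{d\mathcal{P}(z)}{dz} &= 4\,\Im\Big(\mathsf{E}\sum_{lmnk} e^{j\Omega_{lmnk} z}\, Q_l Q_m Q_n^* Q_k^*\, \delta_{lmnk}\Big) + \sum_{k=1}^{n} \mathsf{E}\left(Q_k^* W_k + Q_k W_k^*\right) \\
&= \sum_{k=1}^{n} \mathsf{E}\left(Q_k^*(z) W_k(z) + Q_k(z) W_k^*(z)\right),
\end{aligned} \tag{7}$$

where we used the fact that the nonlinear term is real-valued,
since $\Omega_{lmnk} = -\Omega_{nklm}$. We now integrate (7) in distance.
From (4), $Q_k(z)$ contains a term depending on $W_k(l)$, $l < z$,
and a Brownian motion term $B_k(z) = \int_0^z W_k(l)\,dl$. The first
term is independent of $W_k(z)$; from the second term we get

$$\begin{aligned}
\mathsf{E}\Big(\int_0^z \left(Q_k^*(l) W_k(l) + \text{c.c.}\right) dl\Big) &= \mathsf{E}\Big(\int_0^z \left(B_k^*(l)\, dB_k(l) + \text{c.c.}\right)\Big) \\
&= \mathsf{E}|B_k(z)|^2 \\
&= f_0\sigma_0^2 z,
\end{aligned}$$

where c.c. stands for complex conjugate. Summing over
$1 \le k \le n$, we obtain (6). ■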
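The power law (6) can be checked by simulation in a simplified setting. The sketch below does not simulate the full system (4); as an assumption it uses a single complex sample, a discrete concatenation of an energy-preserving phase rotation and additive complex Gaussian noise, and made-up parameter values. It estimates the average output power by Monte Carlo and compares it with input power plus accumulated noise power.

```python
import cmath
import math
import random


def simulate_power(p0, noise_var_per_step, steps, trials, seed=0):
    """Monte Carlo estimate of the output power after `steps` noisy segments."""
    rng = random.Random(seed)
    s = math.sqrt(noise_var_per_step / 2)  # std of each real noise component
    total = 0.0
    for _ in range(trials):
        # Constant-modulus input of power p0 with a uniformly random phase.
        q = cmath.rect(math.sqrt(p0), rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(steps):
            # Energy-preserving nonlinear phase rotation (hypothetical step 0.1).
            q *= cmath.exp(-2j * abs(q) ** 2 * 0.1)
            # Additive circularly symmetric complex Gaussian noise.
            q += complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        total += abs(q) ** 2
    return total / trials


estimate = simulate_power(p0=1.0, noise_var_per_step=0.05, steps=20, trials=20000)
predicted = 1.0 + 20 * 0.05  # P(0) plus total accumulated noise power
print(estimate, predicted)
```

The estimate fluctuates around the predicted value, regardless of the strength of the phase rotation, because the rotation leaves $|q|^2$ unchanged and the noise is zero-mean and independent of the signal.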

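Using Lemma 1, the output entropy rate can be upper
bounded as follows:

$$\begin{aligned}
\frac{1}{n} h(\underline{Q}^n(z)) &\overset{(a)}{\le} \frac{1}{n}\log\left((\pi e)^n \det K(z)\right) \\
&= \log \pi e + \frac{1}{n}\log\left(\det K(z)\right) \\
&\overset{(b)}{\le} \log \pi e + \frac{1}{n}\sum_{k=1}^{n} \log\left(K_{kk}(z)\right) \\
&\overset{(c)}{\le} \log \pi e + \frac{1}{n}\sum_{k=1}^{n} \log\left(\mathsf{E}|Q_k(z)|^2\right) \\
&\overset{(d)}{\le} \log \pi e + \log\Big(\frac{1}{n}\sum_{k=1}^{n} \mathsf{E}|Q_k(z)|^2\Big) \\
&= \log \pi e + \log\left(\mathcal{P}(z)/n\right) \\
&\overset{(e)}{\le} c_n + \log\left(\mathcal{P}_0 + \sigma^2(z)\right),
\end{aligned} \tag{8}$$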

where $K(z) > 0$ is the covariance matrix of $\underline{Q}^n(z)$ with
entries $K_{kl}(z)$. Step (a) is due to the maximum entropy
theorem. Step (b) follows from Hadamard's inequality. For
step (c), note that in (3), power was defined as average energy
in time interval $\mathcal{T}$ divided by $\mathcal{T}$. As a result, a non-zero
constant signal has non-zero power. In the covariance matrix,
in contrast to (3) and (5), the mean of the random variable is
subtracted as $K_{kk}(z) = \mathsf{E}|Q_k(z)|^2 - |\mathsf{E}(Q_k(z))|^2$. Unlike
the power (5), the mean term $\sum_{k=1}^{n} |\mathsf{E}Q_k(z)|^2$ is not
preserved in the noise-free channel. Furthermore, a zero-mean signal
at the input may not have zero mean at $z > 0$. Nevertheless, step (c)
holds since $K_{kk}(z) \le \mathsf{E}|Q_k(z)|^2$. Steps (d) and (e) follow,
respectively, from the concavity of the log function and (6). In
steps (b), (c) and (e), we also used the fact that log is an increasing
function.
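Steps (b) and (d) of the chain (8) can be illustrated on a small example. The sketch below is only an illustration under simplifying assumptions: it uses a hypothetical real symmetric positive definite $3 \times 3$ matrix instead of a complex covariance matrix, and checks Hadamard's inequality followed by Jensen's inequality for the logarithm.

```python
import math


def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


# Hypothetical covariance matrix (symmetric, positive definite).
K = [[2.0, 0.5, 0.1],
     [0.5, 1.0, 0.3],
     [0.1, 0.3, 1.5]]
n = len(K)

log_det_rate = math.log(det3(K)) / n                     # (1/n) log det K
hadamard = sum(math.log(K[i][i]) for i in range(n)) / n  # (1/n) sum log K_kk
jensen = math.log(sum(K[i][i] for i in range(n)) / n)    # log of mean power

print(log_det_rate, hadamard, jensen)
```

The three printed numbers are non-decreasing from left to right: correlations between components only reduce the log-determinant, and unequal per-component powers only reduce the average of the logarithms.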



B. Lower Bound on the Conditional Entropy 

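Lemma 2 (Volume Preservation in $T_z$). Let $\Omega = (\mathbb{L}, \mathcal{E}, \mu)$ be
a measure space, where $\mathbb{L} = \{\underline{q}^n \mid \sum_{k=1}^{n} |q_k|^2 < \infty\}$ and

$$\mu(A) = \mathrm{vol}(A) = \int_A \prod_{k=1}^{n} dq_k\, dq_k^*$$

is the Lebesgue measure. Transformation $T_z$, as a dynamical
system on $\mathbb{L}$, is measure-preserving. That is to say

$$\mu(T_z^{-1}(A)) = \mu(A), \quad \forall A \in \mathcal{E}.$$
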

Proof: We note that, when $W_k = 0$, the ODE system (4)
is Hamiltonian, i.e., it permits an alternative formulation

$$\dot{x}_k = \frac{\partial H}{\partial y_k}, \quad \dot{y}_k = -\frac{\partial H}{\partial x_k}, \quad k = 1, \cdots, n, \tag{9}$$

where dot represents $\partial_z$, $(x_k, y_k) = (q_k, q_k^*)$, and the
Hamiltonian function $H$ is given by

$$H(\underline{x}^n, \underline{y}^n) = j\sum_{k=1}^{n} (\ldots)\, x_k y_k - j\sum_{a,b,c,d=1}^{n} x_a x_b y_c y_d\, e^{j\Omega_{abcd} z}\, \delta_{abcd}.$$

Liouville's theorem asserts that Hamiltonian systems preserve
the Lebesgue measure [8]. This is indeed easy to see. Let
$d\mu = \prod_{k=1}^{n} dx_k\, dy_k$. Then

$$d\dot{\mu} = \left(d\dot{x}_1\, dy_1 + dx_1\, d\dot{y}_1\right) \prod_{k=2}^{n} dx_k\, dy_k + \cdots = 0,$$

where we substituted (9). It follows that $T_z$ is a volume-preserving
transformation (in the sense of ergodic theory [9]). ■



Lemma 3 (Entropy Preservation in $T_z$). The flow of $T_z$ is
entropy-preserving, i.e., $h(T_z^{-1}(\underline{Q}^n)) = h(\underline{Q}^n)$.

Proof: From Lemma 2, $T_z$ is a measure-preserving
transformation; therefore it has unit (determinant) Jacobian,
$\det J = 1$, where $J$ is the $\mathbb{R}^{2n \times 2n}$ Jacobian matrix.
Since $T_z$ is also invertible,

$$h(T_z^{-1}(\underline{Q}^n)) = h(\underline{Q}^n) - \mathsf{E}\log|\det J| = h(\underline{Q}^n).$$

Note that with $J$ as a $\mathbb{C}^{n \times n}$ matrix, there would be a
factor 2 in front of the log. ■

In the example of the NLS channel (2), the dispersion and
nonlinear parts are separable and can be solved in simple
forms. In such examples, it might be possible to directly check
that the flow of the equation has unit Jacobian. Note that the
dispersion operator, being a unitary transformation, has unit
Jacobian. One can also verify that the nonlinear part of the
NLS equation (2) has unit Jacobian too. Consider

$$Y = X\exp\left(jf(|X|)\right), \quad X, Y \in \mathbb{C}, \tag{10}$$

for any differentiable function $f(X)$. In (2), $f(X) = -2zX^2$,
$X = Q(t,0)$ and $Y = Q(t,z)$. Linearizing at $X = 0$, $dY = dX$.
More formally, in polar coordinates

$$R_Y = R_X, \quad \Phi_Y = \Phi_X + f(R_X),$$

where $(R_X, \Phi_X)$ and $(R_Y, \Phi_Y)$ are the polar coordinates of $X$
and $Y$, respectively. Clearly $\det J = 1$, and the same holds
in the Cartesian coordinates because $|Y| = |X|$. Since the
transformation from the NLS equation (2) in the time domain
to the ODE system (4) in the discrete frequency domain is
also unitary and unit Jacobian, $T_z$ has unit Jacobian.
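The unit-Jacobian claim for the map (10) can be checked numerically in Cartesian coordinates. The following sketch is an illustration only: it picks $f(R) = -2zR^2$ with a hypothetical distance `z = 0.7`, and estimates the $2 \times 2$ real Jacobian determinant by central finite differences. Any other differentiable $f$ could be substituted.

```python
import cmath


def phase_map(x, y, z=0.7):
    """Cartesian form of Y = X * exp(j f(|X|)) with f(R) = -2 z R^2."""
    big_x = complex(x, y)
    big_y = big_x * cmath.exp(-2j * z * abs(big_x) ** 2)
    return big_y.real, big_y.imag


def jacobian_det(x, y, h=1e-6):
    """Central-difference estimate of det J of phase_map at the point (x, y)."""
    ux1, vx1 = phase_map(x + h, y)
    ux0, vx0 = phase_map(x - h, y)
    uy1, vy1 = phase_map(x, y + h)
    uy0, vy0 = phase_map(x, y - h)
    du_dx, dv_dx = (ux1 - ux0) / (2 * h), (vx1 - vx0) / (2 * h)
    du_dy, dv_dy = (uy1 - uy0) / (2 * h), (vy1 - vy0) / (2 * h)
    return du_dx * dv_dy - du_dy * dv_dx


for point in [(0.3, -1.2), (1.0, 0.5), (0.0, 0.0)]:
    print(point, jacobian_det(*point))
```

The printed determinants equal one up to finite-difference error at every test point, even though the individual partial derivatives depend strongly on the amplitude.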

Finally, it is also possible to check that $T_z$ is entropy-preserving
using the elementary properties of the entropy. It
is obvious that the dispersion operator is entropy-preserving.
In the continuous model (2), the nonlinear transformation in
each time sample is given by (10). Using the chain rule for
entropy

$$\begin{aligned}
h(R_Y, \Phi_Y) &= h(R_Y) + h(\Phi_Y \mid R_Y) \\
&= h(R_X) + h(\Phi_X + f(R_X) \mid R_X) \\
&= h(R_X) + h(\Phi_X \mid R_X) \\
&= h(R_X, \Phi_X).
\end{aligned}$$

Note that the entropy of a complex random variable is defined
as the joint entropy of the real and imaginary parts.
Changing variables to the Cartesian coordinate system shifts
the entropy by $\mathsf{E}\log|\det J| = \mathsf{E}\log R_Y = \mathsf{E}\log R_X$. Thus
$h(R_Y\exp(j\Phi_Y)) = h(R_X\exp(j\Phi_X))$. The result also holds
for the vector version of (10). Because the Fourier
transform is also entropy-preserving, so is $T_z$.

The last two approaches, however, depend on details of the
example at hand. For some equations the nonlinear part is not
an additive term to dispersion, and even if it is, it may not be
simply solvable like (10). For instance, the nonlinear part of
the Korteweg-de Vries (KdV) equation is Burgers' equation,
which is not easily solvable as (10), so as to examine entropy
preservation directly. However, it is quite easy to show that
the KdV equation, and indeed a large number of evolution
equations, are Hamiltonian.

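Lemma 4 (Monotonicity of the Entropy in $S_z$). The conditional
entropy rate in $S_z$ is lower-bounded by the noise entropy
rate, i.e.,

$$\frac{1}{n} h\left(S_z(\underline{Q}^n(0)) \mid \underline{Q}^n(0)\right) \ge c_n + \log \sigma^2(z).$$

Proof: In a small interval $\Delta z$ in (4)

$$S_{z+\Delta z}(\underline{Q}^n(0)) = T_{\Delta z}\left(S_z(\underline{Q}^n(0))\right) + \underline{W}^n(z)\sqrt{\Delta z}. \tag{11}$$

The two terms in the right hand side of (11) are independent.
Applying the EPI (14),

$$\begin{aligned}
2^{\frac{1}{n} h(S_{z+\Delta z}(\underline{q}^n(0)))} &\ge 2^{\frac{1}{n} h(T_{\Delta z}(S_z(\underline{q}^n(0))))} + 2^{\frac{1}{n} h(\underline{W}^n(z)\sqrt{\Delta z})} \\
&= 2^{\frac{1}{n} h(S_z(\underline{q}^n(0)))} + 2^{c_n}\sigma^2(\Delta z),
\end{aligned}$$

where the last step follows because, from Lemma 3, $T_{\Delta z}$ is
entropy-preserving and $\underline{W}^n$ is Gaussian. Given $\underline{Q}^n(0) = \underline{q}^n(0)$,
we integrate in $z$ to obtain

$$\begin{aligned}
2^{\frac{1}{n} h(S_z(\underline{q}^n(0)))} &\ge 2^{\frac{1}{n} h(\underline{q}^n(0))} + 2^{c_n}\sigma^2(z) \\
&= 2^{c_n}\sigma^2(z),
\end{aligned}$$

where the first term vanishes because the input is deterministic
once it is conditioned on. It follows that

$$\frac{1}{n} h\left(\underline{Q}^n(z) \mid \underline{Q}^n(0)\right) \ge c_n + \log\left(\sigma^2(z)\right). \tag{12}$$

■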
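The integration step in the proof of Lemma 4 amounts to summing equal entropy-power increments along the link. The sketch below illustrates this bookkeeping under the stated model; the function name and the numerical values of `n`, `f0`, `sigma0_sq` and `z` are hypothetical, and logarithms are taken to base 2.

```python
import math


def entropy_power_lower_bound(f0, sigma0_sq, z, segments):
    """Accumulate the EPI increments of 2^{h/n} over slices of length z/segments."""
    dz = z / segments
    power = 0.0  # 2^{h/n} of a deterministic (conditioned) input is zero
    for _ in range(segments):
        # The noise-free flow leaves 2^{h/n} unchanged (Lemma 3); Gaussian noise
        # with per-mode variance f0*sigma0^2*dz adds pi*e*f0*sigma0^2*dz.
        power += math.pi * math.e * f0 * sigma0_sq * dz
    return power


n, f0, sigma0_sq, z = 8, 0.5, 0.2, 3.0
bound = math.log2(entropy_power_lower_bound(f0, sigma0_sq, z, segments=1000))

c_n = math.log2(math.pi * math.e / n)
sigma_sq_z = n * f0 * sigma0_sq * z  # sigma^2(z) = B*sigma0^2*z with B = n*f0
print(bound, c_n + math.log2(sigma_sq_z))
```

Both printed values coincide, showing that the accumulated increments reproduce the right-hand side of (12) independently of how finely the link is sliced.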

Combining (8) and (12), we bound the mutual information

$$\frac{1}{n} I\left(\underline{Q}^n(0); \underline{Q}^n(z)\right) \le \log\Big(1 + \frac{\mathcal{P}_0}{\sigma^2(z)}\Big).$$

Noting that the right hand side is independent of the input
distribution, we obtain the upper bound (1).

Remark 1 (Spectral Efficiency (SE) in the Case $n \neq m$).
In this paper, we did not introduce filters into the model.
Any potential filtering at the receiver (possibly due to spectral
broadening) can only decrease the mutual information (by the
chain rule). Furthermore, let $B(z)$ be the bandwidth at distance
$z$, according to a certain definition. Since $B(0) \le \max_z B(z)$,
normalizing by $\max_z B(z)$ would only decrease the SE relative
to the linear dispersive channel (where $B(z) = B(0)$). In
summary, nonlinearity is entropy-preserving and the effect of
its spectral broadening does not increase data rate or the SE.
The upper bound (1) on the SE holds if $n \neq m$.

Throughout the paper, we assumed that noise bandwidth
is larger than the signal bandwidth. Otherwise, capacity can
be (nearly) unbounded by exploiting the (nearly) noise-free
frequency band. □

The upper bound (1) is indeed simple. In this paper, we
discussed it in the context of a general Hamiltonian channel
with continuous evolution. In particular, it also holds for
a discrete concatenation of energy- and entropy-preserving
systems with additive white Gaussian noise.

A different account of the upper bound (1) is given in [10]
using the split-step Fourier method.

IV. Conclusion 

It is shown that the capacity of the point-to-point optical
fiber channel, modeled via the stochastic nonlinear
Schrödinger equation (2), and subject to a power and bandwidth
constraint, is upper-bounded by $\log(1 + \mathrm{SNR})$.

Appendix A

The Entropy Power Inequality 

Lemma 5 (Entropy Power Inequality). Let $X, Y \in \mathbb{R}^n$ be
independent random variables. Define the entropy power of a
random variable $X \in \mathbb{R}^n$ as

$$\sigma_e^2(X) = \frac{e^{\frac{2}{n} h(X)}}{2\pi e}. \tag{13}$$

Then

$$\sigma_e^2(X + Y) \ge \sigma_e^2(X) + \sigma_e^2(Y). \tag{14}$$

Equality holds if and only if $X$ and $Y$ are Gaussian with
proportional covariance matrices.
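A small worked instance of (14) may help fix ideas. The sketch below, which is an illustration rather than part of the proof, takes $X$ and $Y$ independent and uniform on $[0,1]$ (an assumed example), so that $X + Y$ has a triangular density on $[0,2]$. It estimates the differential entropies in nats with a midpoint rule and compares the entropy powers defined in (13).

```python
import math


def differential_entropy(pdf, a, b, steps=100000):
    """Midpoint-rule estimate of -∫ f ln f over [a, b], in nats."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        f = pdf(a + (i + 0.5) * h)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total


def entropy_power(h_nats, n=1):
    """Entropy power (13) of a random variable in R^n with entropy h_nats."""
    return math.exp(2.0 * h_nats / n) / (2.0 * math.pi * math.e)


def uniform(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def triangular(x):
    """Density of the sum of two independent uniforms on [0, 1]."""
    return x if x <= 1.0 else 2.0 - x


h_x = differential_entropy(uniform, 0.0, 1.0)
h_sum = differential_entropy(triangular, 0.0, 2.0)
print(entropy_power(h_sum), 2.0 * entropy_power(h_x))
```

For this example the entropy of the sum is one half nat, so the left side of (14) is $1/(2\pi)$ while the right side is $1/(\pi e)$; the inequality is strict because the summands are not Gaussian.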

Proof: By now there are many proofs of the EPI. A simple
proof is given in [11, Section 17.8]. It can be explained as
follows.

Consider $n = 1$. We are looking for an inequality involving
the convolution $f_X(x) * f_Y(y)$. The well-known Young's
inequality for $f_X(x) \in L^p(\mathbb{R})$ and $f_Y(y) \in L^q(\mathbb{R})$ states

$$\|f_X(x) * f_Y(y)\|_a \le C\, \|f_X(x)\|_p\, \|f_Y(y)\|_q, \tag{15}$$

where $1/p + 1/q = 1/a + 1$ ($p, q, a \ge 1$), and $C = \sqrt{C_p C_q / C_a}$,
$C_x = x^{1/x} / x'^{1/x'}$, where $x'$ is conjugate to $x$, i.e., $1/x + 1/x' = 1$.
When $p, q > 1$, the equality holds if and only if $f_X(x)$
and $f_Y(y)$ are Gaussian. On the other hand, entropy and
norm of a probability density $f_X(x)$ are related via
$h(X) = -\partial_a \log \|f_X(x)\|_a^a$ at $a = 1$. However, differentiating both
sides of an inequality does not preserve the sense of the
inequality. Nevertheless, using L'Hôpital's rule we can convert
differentiation to a limit

$$h(X) = \lim_{a \downarrow 1} \frac{1}{1 - a} \log \|f_X(x)\|_a^a.$$

This in turn gives $\sigma_e^2(X) = \frac{1}{2\pi e}\lim_{a \downarrow 1} \|f_X(x)\|_a^{-2a'}$. At
$a = 1 + \epsilon$ ($\epsilon \to 0$), the left side of (15) gives $\sigma_e^2(X + Y)$. For a
given $a$, there is one free parameter in the right hand side of
(15). By choosing the free parameter such that the right side
of (15) is maximized, we obtain the EPI. The case $n > 1$
is obtained by replacing entropy with entropy rate (and using
a version of (15) in $\mathbb{R}^n$ to find conditions of equality). The
equality in (14) results from the equality in (15).

The EPI, in some sense, is the derivative of Young's
inequality. ■

Several remarks are in order now. 

a) Bound on conditional discrete entropy: Let $A$ and $B$
be finite discrete sets (alphabets). Since not all elements of
$A + B$ are distinct, we have the sumset inequality

$$\mu(A + B) \le \mu(A)\mu(B), \tag{16}$$

where $\mu$ denotes set cardinality. This in turn gives

$$H(X + Y) \le H(X) + H(Y), \tag{17}$$

where $X$ and $Y$ are independent discrete random variables
taking values, respectively, in alphabets $A$ and $B$, and $H$
is discrete entropy. For uniform random variables (17) is
just the sumset inequality (16); non-uniform distributions
can (almost) be converted to uniform distributions via the
asymptotic equipartition theorem [11]. The inequality (17)
reflects the fact that the sum of independent discrete random
variables typically does not tend to a uniform random variable
(maximum entropy). In fact, in a sense, $X + Y$ is "less
uniform" than $X$ and $Y$. In sharp contrast, the (normalized)
sum of independent continuous random variables tends to a
Gaussian random variable (maximum entropy); however, the
increase in randomness is measured in entropy power, not the
entropy itself.
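The discrete inequalities (16) and (17) are easy to check on a toy case. In the sketch below, the alphabets `A` and `B` and the uniform distributions on them are hypothetical choices made for illustration; entropies are in bits.

```python
import math
from itertools import product


def entropy_bits(pmf):
    """Discrete entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def sum_pmf(pmf_x, pmf_y):
    """Pmf of X + Y for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out


A = {0, 1, 2}
B = {0, 1, 2, 3}
sumset = {a + b for a in A for b in B}  # many pairs collide on the same sum

pmf_x = {a: 1.0 / len(A) for a in A}
pmf_y = {b: 1.0 / len(B) for b in B}
pmf_sum = sum_pmf(pmf_x, pmf_y)

print(len(sumset), len(A) * len(B))
print(entropy_bits(pmf_sum), entropy_bits(pmf_x) + entropy_bits(pmf_y))
```

Here twelve input pairs map onto only six distinct sums, and the entropy of the sum falls well short of the sum of the entropies, in line with (16) and (17).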

The inequality (17) seems to indicate that as noise is added
along the optical fiber, the conditional entropy of the signal
does not increase. Two distinct pairs $(\underline{q}_1^n, \underline{w}_1^n)$ and
$(\underline{q}_2^n, \underline{w}_2^n)$ can have the same sum
$\underline{q}_1^n + \underline{w}_1^n = \underline{q}_2^n + \underline{w}_2^n$, making
$\underline{Q}^n + \underline{W}^n$ potentially "less random", so to speak. This is,
however, true only in a discrete-state model in which $\underline{q}^n$ is
quantized in a finite set. It follows that the entropy bounds in this
paper may not be valid in discrete-state models, due to important
differences between the differential and discrete entropies.
This difference stems from the properties of the cardinality
(volume) in discrete (continuous) sets.

b) Growth of the effective variance in evolution: For a
Gaussian random variable with variance $\sigma^2$, $\sigma_e^2(X) = \sigma^2$.
Thus one may think of $\sigma_e^2(X)$ as the effective variance of $X$
or the squared radius of the support of $X$ (hence the notation).

A family of fascinating metric inequalities analogous to
(14) exist in geometry and analysis, where the squared radius
(13) is defined differently [12]. Notably, in one of its facets,
the Brunn-Minkowski inequality (BMI) for compact regions
$A, B \subset \mathbb{R}^n$ states

$$\mu^{1/n}(A + B) \ge \mu^{1/n}(A) + \mu^{1/n}(B), \tag{18}$$

where $\mu$ is the Lebesgue measure (volume) and $A + B$ is the
Minkowski sum of $A$ and $B$. The BMI looks like the EPI with
$\sigma_e^2(X) = \mu^{1/n}(A)$. Let $A_\epsilon^m$, $B_\epsilon^m$ and $C_\epsilon^m$ be, respectively, the
$\epsilon$-typical sets of random variables $X, Y \in \mathbb{R}^n$ and $Z = X + Y$.
From the concentration of measure or the asymptotic equipartition
property, $\mu^{1/m}(A_\epsilon^m) \to e^{h(X)}$ for large $m$ as $\epsilon \to 0$.
Applying the BMI to $A_\epsilon^m$ and $B_\epsilon^m$, we obtain the EPI with
factor one in the exponent in (13) instead of two. The result is not
the desired EPI. This is because $C_\epsilon^m \neq A_\epsilon^m + B_\epsilon^m$. In fact, again
from the concentration of measure, $A_\epsilon^m + B_\epsilon^m$ concentrates on a
smaller set $C_\epsilon^m \subset A_\epsilon^m + B_\epsilon^m$, i.e.,
$\mu(C_\epsilon^m) < \mu(A_\epsilon^m)\mu(B_\epsilon^m)$.
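The BMI (18) can be verified directly for axis-aligned boxes, for which the Minkowski sum is again a box whose side lengths add. The sketch below uses this special case as an assumption; the side lengths are arbitrary illustrative values.

```python
import math


def volume(sides):
    """Volume of an axis-aligned box with the given side lengths."""
    return math.prod(sides)


def bmi_gap(sides_a, sides_b):
    """Left side minus right side of (18) for two axis-aligned boxes."""
    n = len(sides_a)
    sum_sides = [a + b for a, b in zip(sides_a, sides_b)]  # Minkowski sum box
    lhs = volume(sum_sides) ** (1.0 / n)
    rhs = volume(sides_a) ** (1.0 / n) + volume(sides_b) ** (1.0 / n)
    return lhs - rhs


print(bmi_gap([1.0, 2.0, 3.0], [3.0, 1.0, 2.0]))  # differently shaped boxes
print(bmi_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # scaled copies of one box
```

The gap is strictly positive for boxes of different shapes and vanishes (up to rounding) when one box is a scaled copy of the other, which mirrors the equality condition of the EPI for Gaussians with proportional covariance matrices.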

To obtain the EPI from the BMI, and thus to give the EPI
a geometric meaning, we need a probabilistic version of the
Minkowski sum, where the volume is defined as the size of
high probability sequences. Define the $\Omega$-restricted Minkowski
sum of two sets $A, B \subset \mathbb{R}^n$

$$A +_\Omega B = \{a + b \mid (a, b) \in \Omega \subset A \times B\}. \tag{19}$$

The restricted BMI states that, if $\mu(\Omega) \ge (1 - \delta)\mu(A)\mu(B)$
for some $\delta > 0$, then (18) holds but with exponent $2/n$ [13,
Theorem 1.2, with large $n$]. Furthermore, the restricted BMI
is sharp, regardless of how close $\Omega$ is to $A \times B$, i.e., as $\delta \to 0$.
That is to say, even a small uncertainty in the size of $A \times B$
would increase the exponent in the BMI by a factor of two. The
inequality is best seen for Gaussian random variables where
typical sets can be imagined as spherical shells [11]. Applying
the restricted BMI to $A_\epsilon^m$ and $B_\epsilon^m$ with
$\Omega = \{(a, b) \mid a \in A_\epsilon^m,\ b \in B_\epsilon^m,\ a + b \in C_\epsilon^m\}$,
we successfully obtain the EPI.

With the geometric interpretation of the BMI for typical
sequences, the upper bound (1) is trivial. The output
typical set $A_\epsilon^m(\underline{Q}^n(z))$ is covered by the sphere
$S_{2nm}\big(\underline{q}_{c_1}(z), \sqrt{m(\mathcal{P}_0 + \sigma^2(z))}\big)$, centered at some
$\underline{q}_{c_1}(z)$. For a particular input sequence $\underline{q}^n(0)$, as the typical
sets of the signal and noise are overlapped in the optical link, the
resulting region can be packed by a sphere
$S_{2nm}\big(\underline{q}_{c_2}(z), \sqrt{m\sigma^2(z)}\big)$, centered at some
$\underline{q}_{c_2}(z)$. The capacity sphere-packing interpretation gives
$C \le \log(1 + \mathrm{SNR})$, i.e., the bound (1).

Inequalities in the family to which (14) and (18) belong
appear intimately connected; however, it seems difficult to
deduce them all from one master inequality, due to important
differences among them. There is substantial work on this type
of inequality; see [13] and references in [11].